Evoking agency: Attention model and behavior control in a robotic art installation
Robotic embodiments of artificial agents seem to reinstate a body-mind dualism as a consequence of their technical implementation, but could this supposition be a misconception? The authors present their artistic, scientific and engineering work on a robotic installation, the Articulated Head, and its perception-action control system, the Thinking Head Attention Model and Behavioral System (THAMBS). The authors propose that agency emerges from the interplay of the robot's behavior and the environment and that, in the system's interaction with humans, it is attributed to the robot to the same degree as it is grounded in the robot's actions: agency cannot be instilled; it needs to be evoked.
From Robot Arm to Intentional Agent: the Articulated Head
Robot arms have come a long way from the humble beginnings of the first Unimate robot, installed at a General Motors plant to unload parts from a die-casting machine, to the flexible and versatile tools ubiquitous and indispensable in many fields of industrial production today. The other chapters of this book attest to the progress in the field and the plenitude of applications of robot arms. It is still fair to say, however, that industrial robot arms are currently applied primarily in continuously repeated manufacturing tasks for which they are pre-programmed. They are known for their precision and reliability, but in general they use only limited sensory input, and the changes in the execution of their task due to varying environmental factors are minimal. If one were to compare a robot arm with an animal, even a very simple one, this property of robot arm applications would immediately stand out as one of the most striking differences. Living organisms must sense changes in the environment that are crucial to their survival and must have some flexibility to adjust their behaviour. In most robot arm contexts, such a comparison is currently at best of academic interest, though it might gain relevance very quickly if robot arms are to assist humans to a larger extent than at present. If robot arms are to work in close proximity with humans, directly supporting them in accomplishing a task, it becomes inevitable for the control system of the robot to have far-reaching situational awareness and the capability to adjust its 'behaviour' according to the acquired situational information. In addition, robot perception and action have to conform to a large degree to the expectations of the human co-worker.
A system for video-based analysis of face motion during speech
During face-to-face interaction, facial motion conveys
information at various levels. These include a person's emotional
condition, position in a discourse, and, while speaking, phonetic
details about the speech sounds being produced. Trivially, the
measurement of face motion is a prerequisite for any further analysis
of its functional characteristics or information content. It is
possible to make precise measures of locations on the face using
systems that track the motion by means of active or passive markers
placed directly on the face. Such systems, however, have the
disadvantages of requiring specialised equipment, which restricts their
use outside the lab, and of being invasive in the sense that the markers
have to be attached to the subject's face.
To overcome these limitations we developed a video-based system to
measure face motion from standard video recordings by deforming the
surface of an ellipsoidal mesh fit to the face. The mesh is
initialised manually for a reference frame and then projected onto
subsequent video frames. Location changes (between successive frames)
for each mesh node are determined adaptively within a well-defined
area around each mesh node, using a two-dimensional cross-correlation
analysis on a two-dimensional wavelet transform of the
frames. Position parameters are propagated in three steps from a
coarser mesh and a correspondingly higher scale of the wavelet
transform to the final fine mesh and lower scale of the wavelet
transform. The sequential changes in position of the mesh nodes
represent the facial motion. The method takes advantage of inherent
constraints of the facial surface, which distinguishes it from more
general image motion estimation methods, and it returns measurement
points distributed globally over the facial surface, in contrast to
feature-based methods.
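The per-node matching step can be sketched as follows. This is a minimal, single-scale numpy illustration (normalised cross-correlation over a small search window) without the wavelet transform or the coarse-to-fine mesh propagation described above; the function and its parameters are hypothetical, not the authors' implementation.

```python
import numpy as np

def track_node(prev, curr, node, patch=8, search=4):
    """Estimate the displacement of one mesh node between two frames.

    A template patch around the node in `prev` is compared, via
    normalised cross-correlation, against candidate patches in `curr`
    within a +/-`search` pixel window; the best-scoring offset is the
    node's estimated motion.
    """
    y, x = node
    tmpl = prev[y - patch:y + patch, x - patch:x + patch]
    best_score, best_d = -np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = curr[y + dy - patch:y + dy + patch,
                        x + dx - patch:x + dx + patch]
            score = np.sum(tmpl * cand) / (
                np.linalg.norm(tmpl) * np.linalg.norm(cand) + 1e-9)
            if score > best_score:
                best_score, best_d = score, (dy, dx)
    return best_d  # (dy, dx) displacement of this node
```

In the full method this search would run on wavelet-transformed frames, with coarse-mesh estimates seeding the finer meshes.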
Multisensory Integration Sites Identified by Perception of Spatial Wavelet Filtered Visual Speech Gesture Information
Perception of speech is improved when presentation of the audio signal is accompanied by concordant visual speech gesture information. This enhancement is most prevalent when the audio signal is degraded. One potential means by which the brain affords perceptual enhancement is thought to be through the integration of concordant information from multiple sensory channels in a common site of convergence, multisensory integration (MSI) sites. Some studies have identified potential sites in the superior temporal gyrus/sulcus (STG/S) that are responsive to multisensory information from the auditory speech signal and visual speech movement. One limitation of these studies is that they do not control for activity resulting from attentional modulation cued by such things as visual information signaling the onsets and offsets of the acoustic speech signal, as well as activity resulting from MSI of properties of the auditory speech signal with aspects of gross visual motion that are not specific to place of articulation information. This fMRI experiment uses spatial wavelet bandpass filtered Japanese sentences presented with background multispeaker audio noise to discern brain activity reflecting MSI induced by auditory and visual correspondence of place of articulation information that controls for activity resulting from the above-mentioned factors. The experiment consists of a low-frequency (LF) filtered condition containing gross visual motion of the lips, jaw, and head without specific place of articulation information, a midfrequency (MF) filtered condition containing place of articulation information, and an unfiltered (UF) condition. Sites of MSI selectively induced by auditory and visual correspondence of place of articulation information were determined by the presence of activity for both the MF and UF conditions relative to the LF condition. 
Based on these criteria, sites of MSI were found predominantly in the left middle temporal gyrus (MTG) and the left STG/S (including the auditory cortex). By controlling for additional factors that could also induce greater activity resulting from visual motion information, this study identifies potential MSI sites that we believe are involved in improved speech perception intelligibility.
Learning the Mapping Function from Voltage Amplitudes to Sensor Positions in 3D-EMA Using Deep Neural Networks
The first generation of three-dimensional Electromagnetic Articulography
devices (Carstens AG500) suffered from occasional
critical tracking failures. Although now superseded by
new devices, the AG500 is still in use in many speech labs
and many valuable data sets exist. In this study we investigate
whether deep neural networks (DNNs) can learn the mapping
function from raw voltage amplitudes to sensor positions based
on a comprehensive movement data set. This is compared with
previous methods, which arrive at individual position values
sample by sample via direct optimisation. We found that
with appropriate hyperparameter settings a DNN was able to
approximate the mapping function with good accuracy, leading
to a smaller error than the previous methods, but that the
DNN-based approach was not able to solve the tracking problem
completely.
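The general idea of fitting a network to an amplitude-to-position mapping can be illustrated as follows. This numpy sketch is not the authors' architecture, and the toy forward model stands in for the real AG500 field equations; every function, size and hyperparameter here is invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the field model: 6 transmitter "coils" at
# fixed locations, with amplitude falling off smoothly with distance.
centres = rng.normal(size=(6, 3)) * 2.0

def amplitudes(pos):                      # pos: (n, 3) sensor positions
    d = np.linalg.norm(pos[:, None, :] - centres[None, :, :], axis=-1)
    return 1.0 / (1.0 + d ** 2)           # (n, 6) voltage amplitudes

# Synthetic training data: random positions and their amplitudes
P = rng.uniform(-1.0, 1.0, size=(2000, 3))
A = amplitudes(P)

# One-hidden-layer MLP trained with plain gradient descent on MSE
W1 = rng.normal(size=(6, 64)) * 0.3; b1 = np.zeros(64)
W2 = rng.normal(size=(64, 3)) * 0.3; b2 = np.zeros(3)

losses, lr = [], 0.05
for step in range(500):
    H = np.tanh(A @ W1 + b1)              # hidden layer
    Y = H @ W2 + b2                       # predicted positions
    err = Y - P
    losses.append(float(np.mean(err ** 2)))
    gY = 2.0 * err / len(A)               # backprop through the MSE
    gW2, gb2 = H.T @ gY, gY.sum(0)
    gH = gY @ W2.T * (1.0 - H ** 2)       # tanh derivative
    gW1, gb1 = A.T @ gH, gH.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
```

The training loss shrinks steadily on this toy mapping; the paper's point is that even a well-fitted network of this kind did not eliminate all tracking failures.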
Analysis of tongue configuration in multi-speaker, multi-volume MRI data
MRI data of German vowels and consonants were acquired for 9 speakers. In this paper, tongue contours for the vowels were analyzed using the three-mode factor analysis technique PARAFAC. After some difficulties, probably related to what constitutes an adequate speaker sample for this three-mode technique to work, a stable two-factor solution was extracted that explained about 90% of the variance. Factor 1 roughly captured a dimension running from low back to high front; Factor 2 one from mid front to high back. These factors are compared with earlier PARAFAC-based models. These analyses were based on midsagittal contours; the paper concludes by illustrating from coronal and axial sections how non-midline information could be incorporated into this approach.
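A rank-R PARAFAC (CP) decomposition of a three-way array, such as a speakers x contour-points x vowels array, can be sketched with a generic alternating-least-squares loop; this is the textbook formulation in plain numpy, not the authors' exact fitting procedure, and the array shapes are assumptions.

```python
import numpy as np

def parafac_als(X, rank, iters=300, seed=0):
    """Rank-`rank` PARAFAC (CP) decomposition of a 3-way array by
    alternating least squares: X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r].
    Minimal sketch, without normalisation or convergence checks."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.normal(size=(I, rank))
    B = rng.normal(size=(J, rank))
    C = rng.normal(size=(K, rank))
    # Mode-wise unfoldings of X (C-order: trailing index varies fastest)
    X1 = X.reshape(I, J * K)
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)

    def khatri_rao(U, V):
        # Column-wise Kronecker product: row (u,v) is U[u] * V[v]
        return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

    for _ in range(iters):
        A = X1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X3 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C
```

On a noiseless rank-2 array this recovers an essentially exact fit; on real articulatory data the explained variance (about 90% in the study above) is the quantity of interest.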
Thinking head: Towards human centred robotics
The Thinking Head project is a multidisciplinary approach to building intelligent agents for human-machine interaction. The Thinking Head Framework evolved out of the project; it facilitates loose coupling between the various components and forms the central nervous system of a multimodal perception-action system. The paper presents the overall architecture, the components and the attention system. It then concludes with a preliminary behavioral experiment that studies the intelligibility of the audiovisual speech output produced by the Embodied Conversational Agent (ECA) that is part of the system. These results provide the baseline for future evaluations of the system as the project progresses through multiple evaluate-and-refine cycles.
Evaluation of the measurement precision in three-dimensional Electromagnetic Articulography (Carstens AG500)
Three-dimensional Electromagnetic Articulography (EMA) measures the location and orientation of the moving speech articulators in real time by means of small, wired sensors. We evaluated the measurement accuracy of the Carstens AG500 EMA system using data acquired simultaneously with the Vicon optical motion tracking system (OPT). EMA sensors and OPT markers were combined in a single rigid object so that the location and orientation of the EMA sensors could be predicted from OPT motion tracking data. The error was computed as the root mean squared (RMS) error. We found that deviations from constant inter-sensor distances (relative error) were in general below 1 mm and 0.6° while the difference between the measured and estimated positions (absolute error) ranged between 1 and 2 mm and 0.5° and 0.7°. By examining error patterns, four critical orientation regions were detected, but no discernible location-dependent error patterning. Sensor velocity appeared to have little impact. The RMS error of the original position calculation was not found to be a reliable predictor. In the absence of a clear error structure we recommend careful analysis of unexpected findings in speech production data acquired with EMA. Avenues for further improvement of the system are discussed.
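The two error measures distinguished above can be written down directly; a small numpy sketch with hypothetical variable names:

```python
import numpy as np

def absolute_rms(measured, predicted):
    """RMS of Euclidean distances between paired 3-D sensor positions
    (the 'absolute error': measured vs. OPT-estimated locations)."""
    d = np.linalg.norm(measured - predicted, axis=1)
    return float(np.sqrt(np.mean(d ** 2)))

def relative_rms(pos_a, pos_b, nominal_dist):
    """RMS deviation of the measured inter-sensor distance from its
    known constant value on the rigid object (the 'relative error')."""
    d = np.linalg.norm(pos_a - pos_b, axis=1)
    return float(np.sqrt(np.mean((d - nominal_dist) ** 2)))
```

The relative error needs no external ground truth, only the rigid-body constraint; the absolute error requires the simultaneously recorded OPT trajectories.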
Using sensor orientation information for computational head stabilisation in 3D Electromagnetic Articulography (EMA)
We propose a new, simple algorithm that makes use of the sensor orientation information in 3D Electromagnetic Articulography (EMA) for computational head stabilisation. The algorithm also provides a well-defined procedure for the case where only two sensors are available for head motion tracking, and it allows position coordinates and orientation angles to be combined for head stabilisation with equal weighting of each kind of information. An evaluation showed that the method using the orientation angles produced the most reliable results.
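The core two-sensor idea can be sketched as follows: build an orthonormal head frame from the two reference-sensor positions plus one sensor's orientation vector, then express every articulator point in that frame, cancelling rigid head motion. This is one possible construction under stated assumptions, not necessarily the published algorithm in detail.

```python
import numpy as np

def head_frame(p1, p2, ori):
    """Orthonormal head frame from two reference-sensor positions and
    one sensor's orientation vector (ori must not be parallel to the
    inter-sensor axis). Rows of the returned matrix are the axes."""
    x = p2 - p1
    x = x / np.linalg.norm(x)
    y = ori - np.dot(ori, x) * x          # Gram-Schmidt against x
    y = y / np.linalg.norm(y)
    z = np.cross(x, y)
    return np.stack([x, y, z])

def stabilise(point, p1, p2, ori):
    """Express an articulator point in the head-centred frame,
    removing rigid head translation and rotation."""
    R = head_frame(p1, p2, ori)
    origin = 0.5 * (p1 + p2)              # midpoint as frame origin
    return R @ (point - origin)
```

Because the frame rotates and translates with the head, the stabilised coordinates of any point rigidly attached to the head configuration are invariant under arbitrary rigid motions of the whole setup.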
Are there compensatory effects in natural speech?
This work exploited coarticulation and loud speech as natural sources of perturbation in order to determine whether articulatory covariation (motor equivalent behavior) can be observed in speech that is not artificially perturbed. Articulatory analyses of jaw and tongue movement in the production of alveolar consonants by German speakers were performed. The sibilant /s/ shows virtually no articulatory covariation under the influence of natural perturbations, whereas other alveolar consonants show more obvious compensatory behavior. Our conclusion is that an effect of natural sources of perturbation is noticeable, but sounds are affected to different degrees.
- …